Search for: All records

Creators/Authors contains: "Verma, Aayushi"


  1. We introduce the UConn Bubbles with Swatches dataset. This dataset contains images of voting bubbles scanned from Connecticut ballots, captured as either grayscale (8 bpp) or color (RGB, 24 bpp) artifacts and extracted through segmentation using ballot geometry. The images are organized into four datasets. The stored file contains all data together in color, and we convert to grayscale manually. Each image of a bubble is 40x50 pixels. The labels are produced by an optical lens scanner. The first dataset, Gray-B (Bubbles), contains 42,679 images (40x50, 8 bpp): blank bubbles (35,429 images) and bubbles filled in by humans (7,250 images), with no marginal marks. There are two classes, mark and non-mark. The second dataset, RGB-B, is a 24 bpp color (RGB) version of Gray-B. The third dataset, Gray-C (Combined), augments Gray-B with a collection of marginal marks called "swatches", which are synthetic images that vary the position of the signal to create samples close to the decision boundary of an optical lens scanner. The 423,703 randomly generated swatches place equal amounts of random noise throughout each image, so that the amount of light is the same. Together with the 42,679 bubble images, this yields 466,382 labeled images. The fourth dataset, RGB-C, is a 24 bpp color (RGB) version of Gray-C. The empty bubbles were printed by a commercial vendor and have undergone registration and segmentation using predetermined coordinates. Marks are on paper printed by the same vendor. These datasets can be used for classification training. The .h5 file has many levels of datasets, as shown below. The main dataset used for training is POSITIONAL, which is separated only into blank (non-mark) and vote (mark). Whether an example is a bubble or a swatch is indicated by its batch number. See https://github.com/VoterCenter/Busting-the-Ballot/blob/main/Utilities/LoadVoterData.py for code that creates torch arrays for RGB-B and RGB-C.
See the linked GitHub repo (https://github.com/VoterCenter/Busting-the-Ballot/blob/main/Utilities/VoterLab_Classifier_Functions.py) for grayscale conversion functions and other utilities.
Dataset structure:
    COLOR > B/V/Q > IMAGE
    POSITIONAL > B/V/Q > IMAGE
    INFORMATION > COLOR/POSITIONAL > B/V/Q > BACKGROUND RGB VALUES
Images are divided into 'batches', not all of which contain data. INFORMATION contains labels for all images. Q is the swatch data, while B and V are non-mark and mark, respectively.
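The image counts, per-image storage sizes, and the color-to-grayscale step described above can be checked with a minimal pure-Python sketch. The helper names (`rgb_to_gray`, `image_to_gray`, `bytes_per_image`) are hypothetical, and the ITU-R BT.601 luma weights are an assumption chosen for illustration; the conversion functions in the linked repository may use a different formula.

```python
# Illustrative sketch only. Helper names and luma weights are assumptions,
# not the functions from the linked repository.

WIDTH, HEIGHT = 40, 50  # bubble image size stated in the description

# Image counts stated in the description.
BLANK_BUBBLES = 35_429
FILLED_BUBBLES = 7_250
SWATCHES = 423_703


def rgb_to_gray(pixel):
    """Convert one (R, G, B) pixel (0-255 each) to an 8-bit gray level.

    Uses ITU-R BT.601 luma weights (assumed here for illustration).
    """
    r, g, b = pixel
    return int(round(0.299 * r + 0.587 * g + 0.114 * b))


def image_to_gray(rgb_image):
    """Convert rows of (R, G, B) tuples into rows of gray levels."""
    return [[rgb_to_gray(p) for p in row] for row in rgb_image]


def bytes_per_image(bits_per_pixel):
    """Uncompressed size in bytes of one 40x50 image at the given depth."""
    return WIDTH * HEIGHT * bits_per_pixel // 8


# Gray-B / RGB-B size, then Gray-C / RGB-C size after adding swatches.
bubbles_total = BLANK_BUBBLES + FILLED_BUBBLES
combined_total = bubbles_total + SWATCHES

if __name__ == "__main__":
    print("bubble images:", bubbles_total)
    print("combined images:", combined_total)
    print("bytes per 8 bpp image:", bytes_per_image(8))
    print("bytes per 24 bpp image:", bytes_per_image(24))
```

Under these assumptions, an uncompressed grayscale image takes a third of the space of its RGB counterpart, which is one practical reason to train on Gray-B or Gray-C when color carries no extra signal.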
  2. Free, publicly-accessible full text available November 19, 2026